\documentclass{article}

\usepackage{iclr2020_conference,times}

% Comment out the following line for anonymous submission
\def\nonanonymous{}

\ifdefined\nonanonymous
\iclrfinalcopy
\fi


\usepackage[utf8]{inputenc} % allow utf-8 input
\usepackage[T1]{fontenc}    % use 8-bit T1 fonts
\usepackage{hyperref}       % hyperlinks
\usepackage{url}            % simple URL typesetting
\usepackage{booktabs}       % professional-quality tables
\usepackage{amsfonts}       % blackboard math symbols
\usepackage{nicefrac}       % compact symbols for 1/2, etc.
\usepackage{microtype}      % microtypography

% Not part of the official ICLR template
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{bm}
\usepackage{mathtools}
\usepackage{caption}
\usepackage{subcaption}
\usepackage{csquotes}
\usepackage{layouts}
\usepackage{float}
\usepackage{todonotes}
\usepackage{enumitem}

% Algorithms
\usepackage{algorithm}
\usepackage[noend]{algpseudocode}
\algnewcommand{\Let}[2]{\State #1 $\gets$ #2}
\algrenewcommand\Call[2]{\textproc{#1}(#2)}

% Tables
\usepackage{multirow}
\usepackage{tabu}
\usepackage{longtable}
\captionsetup[table]{skip=5pt}

\let\cite\citep
\title{Neural Arithmetic Units}

% The \author macro works with any number of authors. There are two commands
% used to separate the names and addresses of multiple authors: \And and \AND.
%
% Using \And between authors leaves it to LaTeX to determine where to break the
% lines. Using \AND forces a line break at that point. So, if LaTeX puts 3 of 4
% authors names on the first line, and the last on the second line, try using
% \AND instead of \And before the third author name.

\author{%
  Andreas Madsen \\
  Computationally Demanding \\
  \texttt{amwebdk@gmail.com}
  \And
  Alexander Rosenberg Johansen \\
  Technical University of Denmark \\
  \texttt{aler@dtu.dk} \\
}

\begin{document}

\maketitle

\begin{abstract}

%Learning exact arithmetic operation of real numbers, as part of a neural network, presents a unique challenge. Neural networks can approximate complex functions by learning from labeled data. However, when extrapolating to out-of-distribution samples neural networks often fail. Learning the underlying logic, as opposed to an approximation, is crucial in applications that depends on inferring physical models, comparing, or counting as part of the model.

%Alternative
%What’s the domain?
Neural networks can approximate complex functions, but they struggle to perform exact arithmetic operations over real numbers.
%What’s the issue?
The lack of inductive bias for arithmetic operations leaves neural networks without the underlying logic necessary to extrapolate on tasks such as addition, subtraction, and multiplication.
%What’s your contribution?
We present two new neural network components: the Neural Addition Unit (NAU), which can learn exact addition and subtraction; and the Neural Multiplication Unit (NMU), which can multiply subsets of a vector.
%Why is it novel?
The NMU is, to our knowledge, the first arithmetic neural network component that can learn to multiply elements of a vector when the hidden size is large.
%What’s interesting about it?
The two new components draw inspiration from a theoretical analysis of recently proposed arithmetic components.
We find that careful initialization, restricting the parameter space, and regularizing for sparsity are important when optimizing the NAU and NMU.
%How does it perform?
Compared with previous neural units, our proposed NAU and NMU converge more consistently, have fewer parameters, learn faster, can converge for larger hidden sizes, obtain sparse and meaningful weights, and can extrapolate to negative and small values.\ifdefined\nonanonymous\footnote{Implementation is available on GitHub: \url{https://github.com/AndreasMadsen/stable-nalu}.}\fi
\end{abstract}
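As a rough illustration of the multiplication mechanism summarized above, the sketch below shows one way a unit can multiply a subset of its input vector: each weight gates its input so that a weight of 1 includes the input in the product, while a weight of 0 contributes a neutral factor of 1. This is a minimal NumPy sketch based on the abstract's description; the function name \texttt{nmu\_forward} is ours, and the exact formulation, initialization, and regularization are given in the methods section.

```python
import numpy as np

def nmu_forward(x, W):
    """Sketch of a multiplicative unit: for each output o,
    z_o = prod_i (W[i, o] * x[i] + 1 - W[i, o]).
    A weight of 1 includes x[i] in the product; a weight of 0
    contributes a neutral factor of 1, so x[i] is ignored.
    x: shape (in_features,); W: shape (in_features, out_features),
    entries assumed to lie in [0, 1]."""
    return np.prod(W * x[:, None] + 1 - W, axis=0)

# With weights selecting x[0] and x[2], the unit multiplies that subset.
x = np.array([2.0, 5.0, 3.0])
W = np.array([[1.0], [0.0], [1.0]])  # select x[0] and x[2]
print(nmu_forward(x, W))  # → [6.]
```

Because the neutral element of multiplication is 1 rather than 0, gating toward $1 - W_{i,o}$ (instead of simply masking inputs to zero) is what lets the unit ignore irrelevant inputs without collapsing the product.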

\input{sections/introduction}
\input{sections/methods}
\input{sections/related-work}
\input{sections/results}
\input{sections/conclusion}

\clearpage
\ifdefined\nonanonymous
\subsubsection*{Acknowledgments}
We would like to thank Andrew Trask and the other authors of the NALU paper for highlighting the importance and challenges of extrapolation in neural networks. We would also like to thank the students Raja Shan Zaker Kreen and William Frisch Møller from the Technical University of Denmark, who initially showed us that the NALU does not converge consistently.

This research is funded by the Innovation Foundation Denmark through the DABAI project.
\fi

\bibliographystyle{iclr2020_conference}
\bibliography{bibliography}

\newpage
\appendix
\input{appendix/gradient-derivatives}
\clearpage
\input{appendix/moments}
\clearpage
\input{appendix/simple-function-task}
\clearpage
\input{appendix/sequential-mnist}
\clearpage
%\input{appendix/nalu-author-comments}
%\clearpage

\end{document}